Graph Neural Networks (GNN)
| Vertex | Edge | Direction |
|---|---|---|
| People | like each other | undirected |
| People | one is the boss of the other | directed |
| Tasks | cannot be processed at the same time | undirected |
| Computers | have a direct network connection | undirected |
| Airports | planes fly between them | directed |
| Cities | one can travel between them | directed |
Undirected Graph vs. Directed Graph
Weighted Graph
Cycle vs. Acyclic Graph
Graph and Adjacency Matrix
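As a minimal illustration of the adjacency-matrix idea (a hypothetical 3-node toy graph, not the example used below): for an undirected graph, $A_{ij} = 1$ whenever nodes $i$ and $j$ share an edge, which makes $A$ symmetric.

```python
import numpy as np

# Toy undirected graph on 3 nodes with edges (0, 1) and (1, 2).
edges = [(0, 1), (1, 2)]
n = 3

# A[i, j] = 1 when nodes i and j are connected; symmetric for an undirected graph.
A = np.zeros((n, n), dtype=int)
for i, j in edges:
    A[i, j] = 1
    A[j, i] = 1

print(A)
```

Note that each row sum of $A$ is the degree of the corresponding node.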
Convolution Layer
Weight Sharing
Update for Each Node
Matrix Computation for Each Layer
Node-wise summation
$$ Z_G = \tau \left(\sum_{i \in G} MLP \left(H_i^{(L)} \right) \right) $$

Task 1: Node classification

Task 2: Edge prediction
Task 3: Graph classification
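The graph-level readout $Z_G$ above can be sketched in plain NumPy. Everything here is a placeholder: random node embeddings $H^{(L)}$, a one-layer stand-in for the MLP, and $\tau$ taken to be a softmax over three hypothetical graph classes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder final-layer node embeddings H^(L): 5 nodes, 8 features each.
H = rng.standard_normal((5, 8))

# One-layer stand-in for the per-node MLP (random illustrative weights, 3 classes).
W = rng.standard_normal((8, 3))

def mlp(h):
    return np.maximum(0, h @ W)  # ReLU(h W)

# Readout: sum the per-node MLP outputs over the whole graph, then apply tau.
z = sum(mlp(H[i]) for i in range(H.shape[0]))
z = z - z.max()                      # shift for numerical stability
Z_G = np.exp(z) / np.exp(z).sum()    # tau = softmax

print(Z_G)
```

Because the readout sums over all nodes before applying $\tau$, the output size is fixed by the number of classes, independent of the graph's size.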
# !pip install networkx
import networkx as nx
import numpy as np
import matplotlib.pyplot as plt
from scipy.linalg import fractional_matrix_power
%matplotlib inline
G = nx.Graph(name = 'G')
for i in range(6):
    G.add_node(i, name=i)
edges = [(0, 1), (0, 2), (1, 2), (0, 3), (3, 4), (3, 5), (4, 5)]
G.add_edges_from(edges)
print('Graph Info:\n', nx.info(G))
print('\nGraph Nodes: ', G.nodes.data())
nx.draw(G, with_labels = True, font_weight = 'bold')
plt.show()
A = np.array(nx.attr_matrix(G, node_attr='name')[0])
X = np.array(nx.attr_matrix(G, node_attr='name')[1])
X = np.expand_dims(X, axis=1)
print('shape of A: ', A.shape)
print('\nShape of X: ', X.shape)
print('\nAdjacency Matrix (A):\n', A)
print('\nNode Features Matrix (X):\n', X)
AX = np.dot(A, X)
print("Dot product of A and X (AX):\n", AX)
Degree of undirected graph
Normalized $\tilde A$
Adding $I$ adds self-loops (self-connecting edges)
Considering neighboring nodes in the normalized weights
Now we consider a feature matrix $X$
Message passing framework
Message passing with self-loops
Neighborhood normalization
The most basic neighborhood aggregation operation simply sums the neighbor embeddings. One issue with this approach is that it can be unstable and highly sensitive to node degrees.
One solution to this problem is to normalize the aggregation operation by the degrees of the nodes involved. The simplest approach is to take an average rather than a sum.
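The sum-vs-mean contrast can be sketched for the 6-node example graph built above (adjacency written out by hand here so the cell stands alone; mean aggregation is $D^{-1}AX$):

```python
import numpy as np

# Adjacency of the 6-node example graph (no self-loops), written out explicitly.
A = np.array([[0, 1, 1, 1, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 0, 0, 0],
              [1, 0, 0, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
X = np.arange(6, dtype=float).reshape(-1, 1)   # one scalar feature per node

deg = A.sum(axis=1)                  # node degrees

sum_agg = A @ X                      # plain neighbor sum: scales with degree
mean_agg = (A @ X) / deg[:, None]    # D^{-1} A X: degree-normalized average

print(np.hstack([sum_agg, mean_agg]))
```

High-degree nodes like 0 and 3 get large values under the plain sum, while the mean keeps every node's aggregate on the same scale as its neighbors' features.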
Finally Graph Convolutional Networks
$$ \begin{align*} H^{(k+1)} &= \sigma \left( \left(\tilde D^{-1/2}(A+I)\tilde D^{-1/2} \right)H^{(k)} \, W^{(k)}\right)\\ &= \sigma \left( \tilde A H^{(k)} \, W^{(k)}\right) \end{align*} $$

G_self_loops = G.copy()
self_loops = []
for i in range(G.number_of_nodes()):
    self_loops.append((i, i))
G_self_loops.add_edges_from(self_loops)
print('Edges of G with self-loops:\n', G_self_loops.edges)
A_hat = np.array(nx.attr_matrix(G_self_loops, node_attr='name')[0])
print('Adjacency Matrix of added self-loops G (A_hat):\n', A_hat)
AX = np.dot(A_hat, X)
print('AX:\n', AX)
Deg_Mat = G_self_loops.degree()
print('Degree Matrix of added self-loops G (D): ', Deg_Mat)
D = np.diag([deg for (n, deg) in list(Deg_Mat)])
print('Degree Matrix of added self-loops G as numpy array (D):\n', D)
D_inv = np.linalg.inv(D)
print('Inverse of D:\n', D_inv)
DAX = np.dot(D_inv, AX)
print('DAX:\n', DAX)
D_half_norm = fractional_matrix_power(D, -0.5)
DADX = D_half_norm.dot(A_hat).dot(D_half_norm).dot(X)
print('DADX:\n', DADX)
Adding a non-linear function: 1st layer
$$ \begin{align*} H &= f \left(X, \tilde A \right) \\ &= \sigma \left(\tilde A X \, W \right) \end{align*} $$

Adding a non-linear function: $\ell^{\text{th}}$ layer

$$ \begin{align*} H^{(\ell+1)} &= f \left(H^{(\ell)}, \tilde A \right) \\ &= \sigma \left(\tilde A H^{(\ell)} \, W \right) \end{align*} $$

np.random.seed(7777)
n_h = 4
n_y = 2
W0 = np.random.randn(X.shape[1], n_h) * 0.01
W1 = np.random.randn(n_h, n_y) * 0.01
def relu(x):
    return np.maximum(0, x)

def gcn(A, H, W):
    # Add self-loops, then apply the symmetric normalization D^{-1/2} (A+I) D^{-1/2}
    I = np.identity(A.shape[0])
    A_hat = A + I
    D = np.diag(np.sum(A_hat, axis=0))
    D_half_norm = fractional_matrix_power(D, -0.5)
    eq = D_half_norm.dot(A_hat).dot(D_half_norm).dot(H).dot(W)
    return relu(eq)
H1 = gcn(A, X, W0)
H2 = gcn(A, H1, W1)
print('Features Representation from GCN output:\n', H2)
def plot_features(H2):
    x = H2[:, 0]
    y = H2[:, 1]
    size = 1000
    plt.scatter(x, y, size)
    plt.xlim([np.min(x) * 0.9, np.max(x) * 1.1])
    plt.ylim([-1, 1])
    for i, row in enumerate(H2):
        label = str(i)  # renamed from `str` to avoid shadowing the builtin
        plt.annotate(label, (row[0], row[1]), fontsize=18, fontweight='bold')
    plt.show()
plot_features(H2)
#importing dependencies
import numpy as np
import os
import networkx as nx
from tensorflow.keras.utils import to_categorical
from sklearn.preprocessing import LabelEncoder
from sklearn.utils import shuffle
from sklearn.metrics import classification_report
from spektral.layers import GraphConv
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dropout, Dense
from tensorflow.keras import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import TensorBoard, EarlyStopping
import tensorflow as tf
from tensorflow.keras.regularizers import l2
from collections import Counter
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
labels = np.load('./data_files/cora_labels.npy')
nodes = np.load('./data_files/cora_nodes.npy')
edge_list = np.load('./data_files/cora_edges.npy')
X = np.load('./data_files/cora_features.npy')
N = X.shape[0]
F = X.shape[1]
print('X shape: ', X.shape)
print('\nNumber of nodes (N): ', N)
print('\nNumber of features (F) of each node: ', F)
print('\nCategories: ', set(labels))
num_classes = len(set(labels))
print('\nNumber of classes: ', num_classes)
Because a GCN is trained semi-supervised, it needs far fewer labeled examples than a fully supervised method. In this task, the training set therefore consists of only 20 labeled nodes per class, with 500 nodes used for validation and 1,000 nodes for testing.
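Under the hood, this per-node supervision is implemented with a boolean mask that zeroes out the loss of unlabeled nodes (the `sample_weight=train_mask` argument in the training cell plays this role). A minimal NumPy sketch with toy predictions and labels, not the Cora data:

```python
import numpy as np

# Toy setting: 4 nodes, 2 classes; only nodes 0 and 2 carry training labels.
probs = np.array([[0.9, 0.1],
                  [0.4, 0.6],
                  [0.2, 0.8],
                  [0.7, 0.3]])
y_onehot = np.array([[1, 0],
                     [0, 1],
                     [0, 1],
                     [1, 0]])
train_mask = np.array([True, False, True, False])

# Cross-entropy per node, then averaged over the masked (labeled) nodes only.
per_node_ce = -np.sum(y_onehot * np.log(probs), axis=1)
masked_loss = per_node_ce[train_mask].mean()

print(masked_loss)
```

Nodes 1 and 3 contribute nothing to the loss, yet they still influence training through the graph convolution, which propagates their features into labeled nodes' representations.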
def encode_label(labels):
    label_encoder = LabelEncoder()
    labels = label_encoder.fit_transform(labels)
    labels = to_categorical(labels)
    return labels, label_encoder.classes_
labels_encoded, classes = encode_label(labels)
label_counter = np.zeros((num_classes, 1))
train_idx = []
for i in range(len(labels_encoded)):
    label = np.argmax(labels_encoded[i])
    if label_counter[label] < 20:
        train_idx.append(i)
        label_counter[label] += 1
rest_idx = [x for x in range(len(labels)) if x not in train_idx]
val_idx = rest_idx[:500]
test_idx = rest_idx[500:(500 + 1000)]
train_mask = np.zeros((N,), dtype=bool)
train_mask[train_idx] = True
val_mask = np.zeros((N,), dtype=bool)
val_mask[val_idx] = True
test_mask = np.zeros((N,), dtype=bool)
test_mask[test_idx] = True
print("All Data Distribution: \n{}".format(Counter(labels)))
print("\n")
print("Training Data Distribution: \n{}".format(Counter([labels[i] for i in train_idx])))
print("\n")
print("Validation Data Distribution: \n{}".format(Counter([labels[i] for i in val_idx])))
print("\n")
print("Test Data Distribution: \n{}".format(Counter([labels[i] for i in test_idx])))
G = nx.Graph(name='Cora')
G.add_nodes_from(nodes)
G.add_edges_from(edge_list)
print('Graph info: ', nx.info(G))
A = nx.adjacency_matrix(G)
I = np.eye(A.shape[-1], dtype=A.dtype)
A_hat = A + I
degree = np.array(A_hat.sum(1))
D_half_norm = np.power(degree, -0.5).flatten()
D = np.diag(D_half_norm)
print('D:\n', D)
DAD = D.dot(A_hat).dot(D)
print('\nDAD:\n', DAD)
DAD = np.array(DAD, dtype=np.float32)
X = np.array(X, dtype=np.float32)
channels = 16
dropout = 0.5
l2_reg = 5e-4
learning_rate = 1e-2
epochs = 100
es_patience = 10
X_in = Input(shape=(F, ))
fltr_in = Input((N, ), sparse=True)
dropout_1 = Dropout(dropout)(X_in)
graph_conv_1 = GraphConv(channels,
activation='relu',
kernel_regularizer=l2(l2_reg),
use_bias=False)([dropout_1, fltr_in])
dropout_2 = Dropout(dropout)(graph_conv_1)
graph_conv_2 = GraphConv(num_classes,
activation='softmax',
use_bias=False)([dropout_2, fltr_in])
model = Model(inputs=[X_in, fltr_in], outputs=graph_conv_2)
optimizer = Adam(lr=learning_rate)
model.compile(optimizer=optimizer,
loss='categorical_crossentropy',
weighted_metrics=['acc'])
model.summary()
validation_data = ([X, DAD], labels_encoded, val_mask)
model.fit([X, DAD],
labels_encoded,
sample_weight=train_mask,
epochs=epochs,
batch_size=N,
validation_data=validation_data,
shuffle=False,)
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
import numpy as np
def plot_confusion_matrix(y_true, y_pred, classes,
                          size=(10, 10),
                          fontsize=20,
                          cmap=plt.cm.Blues):
    plt.rc('font', size=fontsize)
    cm = confusion_matrix(y_true, y_pred)
    fig, ax = plt.subplots(figsize=size)
    im = ax.imshow(cm, interpolation='nearest', cmap=cmap)
    ax.figure.colorbar(im, ax=ax, fraction=0.046, pad=0.04)
    ax.set(xticks=np.arange(cm.shape[1]),
           yticks=np.arange(cm.shape[0]),
           xticklabels=classes, yticklabels=classes,
           title='Confusion matrix for test dataset',
           ylabel='True label',
           xlabel='Predicted label')
    plt.setp(ax.get_xticklabels(), rotation=45, ha="right", rotation_mode="anchor")
    fmt = 'd'
    thresh = cm.max() / 2.
    for i in range(cm.shape[0]):
        for j in range(cm.shape[1]):
            ax.text(j, i, format(cm[i, j], fmt),
                    ha="center", va="center",
                    color="white" if cm[i, j] > thresh else "black", fontsize=20)
    fig.tight_layout()
    plt.show()
X_te = X[test_mask]
A_te = DAD[test_mask,:][:,test_mask]
y_te = labels_encoded[test_mask]
y_pred = model.predict([X_te, A_te], batch_size=N)
plot_confusion_matrix(np.argmax(y_te, axis=1), np.argmax(y_pred, axis=1), classes, fontsize=15)
layer_outputs = [layer.output for layer in model.layers]
activation_model = Model(inputs=model.input, outputs=layer_outputs)
activations = activation_model.predict([X,DAD],batch_size=N)
x_tsne = TSNE(n_components=2).fit_transform(activations[3])
def plot_tSNE(labels_encoded, x_tsne):
    color_map = np.argmax(labels_encoded, axis=1)
    plt.figure(figsize=(10, 10))
    for cl in range(num_classes):
        indices = np.where(color_map == cl)
        indices = indices[0]
        plt.scatter(x_tsne[indices, 0], x_tsne[indices, 1], label=cl)
    plt.legend()
    plt.show()
plot_tSNE(labels_encoded,x_tsne)
%%html
<center><iframe src="https://www.youtube.com/embed/fOctJB4kVlM?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
%%html
<center><iframe src="https://www.youtube.com/embed/ABCGCf8cJOE?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
%%html
<center><iframe src="https://www.youtube.com/embed/0YLZXjMHA-8?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
%%html
<center><iframe src="https://www.youtube.com/embed/ex2qllcVneY?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
%%html
<center><iframe src="https://www.youtube.com/embed/YL1jGgcY78U?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
%%html
<center><iframe src="https://www.youtube.com/embed/8owQBFAHw7E?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
%%html
<center><iframe src="https://www.youtube.com/embed/R67-JxtOQzg?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
Libraries for TensorFlow: GraphNets, Spektral, DGL (or https://vermamachinelearning.github.io/keras-deep-graph-learning/)
University courses:
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')